Feat/gemma4 adapters #1385

Draft
huseyincavusbi wants to merge 24 commits into TransformerLensOrg:dev from huseyincavusbi:feat/gemma4-adapters

@huseyincavusbi

Contributor

Description

This PR adds TransformerBridge support for the Gemma 4 model family (E2B, E4B, 26B-A4B, and 31B) through a single unified Gemma4ArchitectureAdapter.

Key Implementation Details

  • Unified Adapter (gemma4.py): Dynamically handles all 4 variants by evaluating initialization configuration flags:
    • MoE Blocks: Submodules conditionally spin up only when enable_moe_block=True (specifically for the 26B variant).
    • KV-Sharing: K/V submodules are omitted when num_kv_shared_layers > 0 (for E2B/E4B).
    • PLE Embeddings: Surfaced dynamically when hidden_size_per_layer_input > 0.
    • Weight Processing: Maps and converts Gemma 4's joint QKV layout, dual RoPE configurations, alternating sliding/full attention mechanisms, logit softcapping, and RMSNorm.
    • Includes 45 dedicated unit tests verifying config attributes, MoE behavior, and weight conversions.
  • Shared-Library Updates (3 files, fully opt-in, zero regressions on existing adapter tests):
    1. position_embeddings_attention.py: Applies V norm post-reshape (Gemma 4 is the first architecture featuring per-head value normalization). Handles KV-sharing delegation to Hugging Face's original attention implementation when K/V submodules are omitted. Caches computed KV states in shared_kv_states post-RoPE for structural layer reuse.
    2. bridge.py: Introduces a use_native_generate opt-in flag. This bypasses a current Hugging Face transformers dev-version issue where eager attention causes a KV-cache dimension mismatch during generation. Setting this flag (scoped strictly to this adapter) delegates processing to HF's native generate() utilizing SDPA.
    3. main_benchmark.py: Fixes pad_token_id assignment when eos_token_id is a list (Gemma4 uses [1, 106]), taking the first element.
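The pad_token_id fix in item 3 can be illustrated with a minimal sketch. The helper name `resolve_pad_token_id` is hypothetical and is not taken from main_benchmark.py; it only shows the "take the first element when eos_token_id is a list" behavior described above.

```python
from typing import Optional, Sequence, Union


def resolve_pad_token_id(
    pad_token_id: Optional[int],
    eos_token_id: Union[int, Sequence[int], None],
) -> Optional[int]:
    """Pick a pad token id, falling back to EOS when none is set (hypothetical helper)."""
    # An explicitly configured pad token always wins.
    if pad_token_id is not None:
        return pad_token_id
    # Some configs store several EOS ids, e.g. [1, 106]; use the first one.
    if isinstance(eos_token_id, (list, tuple)):
        return eos_token_id[0] if eos_token_id else None
    # A plain int (or None) is passed through unchanged.
    return eos_token_id
```

With the list quoted above, `resolve_pad_token_id(None, [1, 106])` yields `1`.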

Verification & Performance

All models have been validated.

Fixes #1297

Type of change

Please delete options that are not relevant.

  • New feature (non-breaking change which adds functionality)

Screenshots

Please attach before and after screenshots of the change if applicable.

Checklist:

  • I have commented my code, particularly in hard-to-understand areas
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have not rewritten tests relating to key interfaces which would affect backward compatibility

…utes

- Unwrap text_config for Gemma4ForConditionalGeneration models
- Read PLE, KV sharing, layer_types, softcapping from text_cfg
- Add NotImplementedError guard for MoE variants (26B-A4B)
- Update tests to exercise text_config path
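A rough sketch of the config handling this commit describes, under stated assumptions: the function name `read_gemma4_flags` and the returned dictionary keys are hypothetical, and `SimpleNamespace` stands in for a real Hugging Face config object. Only the attribute names (text_config, enable_moe_block, num_kv_shared_layers, hidden_size_per_layer_input) come from this PR.

```python
from types import SimpleNamespace


def read_gemma4_flags(hf_config):
    """Derive feature flags from an HF-style config object (hypothetical helper)."""
    # Conditional-generation wrappers nest the language-model settings
    # under text_config; plain text configs are used as-is.
    text_cfg = getattr(hf_config, "text_config", None) or hf_config

    uses_moe = bool(getattr(text_cfg, "enable_moe_block", False))
    if uses_moe:
        # Mirrors the guard mentioned in the commit message.
        raise NotImplementedError("MoE variants are not supported by this sketch")

    return {
        "uses_moe": uses_moe,
        # `or 0` tolerates attributes that are present but set to None.
        "uses_kv_sharing": (getattr(text_cfg, "num_kv_shared_layers", 0) or 0) > 0,
        "uses_ple": (getattr(text_cfg, "hidden_size_per_layer_input", 0) or 0) > 0,
    }


# Example with made-up values for a wrapped (multimodal-style) config.
wrapped = SimpleNamespace(
    text_config=SimpleNamespace(
        enable_moe_block=False,
        num_kv_shared_layers=4,
        hidden_size_per_layer_input=256,
    )
)
flags = read_gemma4_flags(wrapped)
```

Reading every flag through one unwrapped `text_cfg` keeps the wrapped and unwrapped config paths on the same code path, which is what the updated tests are said to exercise.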
@huseyincavusbi marked this pull request as draft June 14, 2026 10:49
@jlarson4 changed the base branch from main to dev June 15, 2026 15:47

@jlarson4 left a comment

Collaborator

Hey @huseyincavusbi glad to finally see this come through. I have a couple of comments below; take a look when you have a moment and let me know what you think.

Additionally, @punishell has recently opened #1377, which is a parallel implementation of Gemma4. I'd like to include bits of both implementations where it makes sense and is relevant. They came up with a very straightforward solution for the KV-cache issue that might be of use to you, if you want to try rebasing your work onto theirs as an extension point. I am thinking there may be a way to use their DelegatedAttentionBlockBridge in combination with your work on adding Gemma4 support to position_embeddings_attention to provide even better overall support.

There are more moving parts here than anticipated, if you have questions please feel free to ask.


import pytest

from transformer_lens.config.TransformerBridgeConfig import TransformerBridgeConfig

Collaborator

Since you began this PR, the import path for TransformerBridgeConfig was adjusted due to a name conflict introduced in a related refactor. Please update this to

from transformer_lens.config import TransformerBridgeConfig

# with a specific transformers version). Set self.cfg.use_native_generate = True
# in the adapter's __init__.
if getattr(self.cfg, "use_native_generate", False):
    return self.hf_generate(

Collaborator

This delegation drops kwargs that a user may pass in: stop_at_eos, prepend_bos, padding_side, freq_penalty, use_past_kv_cache, as well as the new stop_strings/stopping_criteria added in #1374, to name a few. Someone using Gemma4 who calls generate(..., stop_strings=".") would have it silently ignored.

If you end up opting to keep use_native_generate, we will need to make sure all relevant kwargs are properly passed through.
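To make the concern concrete, here is a minimal sketch of one way a delegating path could surface ignored kwargs instead of dropping them silently. Everything in it is an assumption for illustration: `split_generate_kwargs` and the `forwarded_names` set are hypothetical and do not reflect bridge.py or the Hugging Face API.

```python
import warnings


def split_generate_kwargs(user_kwargs, forwarded_names):
    """Separate kwargs the delegate accepts from ones that would be dropped.

    Hypothetical helper: `forwarded_names` is whatever set of argument names
    the delegating code has decided to pass along.
    """
    forwarded = {k: v for k, v in user_kwargs.items() if k in forwarded_names}
    dropped = sorted(set(user_kwargs) - set(forwarded))
    if dropped:
        # Warn loudly so an ignored argument is visible to the caller.
        warnings.warn(
            "generate() kwargs ignored by native delegation: " + ", ".join(dropped),
            stacklevel=2,
        )
    return forwarded, dropped


# Example: stop_strings is not in the forwarded set, so it triggers a warning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    forwarded, dropped = split_generate_kwargs(
        {"max_new_tokens": 8, "stop_strings": "."},
        {"max_new_tokens"},
    )
```

Warning (or raising) on unhandled arguments is only a stopgap; forwarding each relevant kwarg, as requested above, remains the actual fix.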

return self.hf_generate(input, **hf_kwargs)

# Adapters can opt-in to delegating generation to HF's native generate()
# (e.g. when the bridge's custom attention has a KV-cache incompatibility

Collaborator

Does no-cache hooked generation (use_past_kv_cache=False) work for Gemma4 with this incompatibility? If so, that's a better stopgap than delegating to hf_generate. It preserves hooks and lets you drop the use_native_generate flag. If you could dig into that and let me know what you find, I'd appreciate it.

Development

Successfully merging this pull request may close these issues.

[Proposal] Gemma4 Architecture Adapter

2 participants